Although we make `conv2d_cudnn.cuda` and `dense_cublas.cuda` AutoTVM tasks so that they can be "tuned" and compared with other implementations, some issues prevent us from actually "tuning" them.

`conv2d_cudnn.cuda`: I constantly got errors from `cudnnFindConvolutionForwardAlgorithm` (on a T4 GPU). Note that this function is called without issues when extracting tasks, so I guess it might be an issue with the CUDA context and threading. The solution in this PR is to add a knob so that the implementation becomes a template with 8 candidates, one per cuDNN forward algorithm. In this case, `cudnnFindConvolutionForwardAlgorithm` will not be called during tuning and everything works well. We still set the knob value to `-1` in the fallback config to keep the current behavior; see the sketch below.
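Roughly, the knob looks like the following (illustrative only; the knob name `algo` and the helper function are placeholders rather than the exact template code):

```python
# Sketch: expose the cuDNN forward algorithm id as an AutoTVM knob so that
# conv2d_cudnn.cuda becomes a template with 8 candidates.
from tvm.autotvm.task.space import OtherOptionEntity

def _define_cudnn_algo_knob(cfg):
    # cuDNN provides 8 forward-convolution algorithms.
    cfg.define_knob("algo", range(8))
    if cfg.is_fallback:
        # No tuning log found: keep the current behavior by passing -1,
        # i.e. let cuDNN search for the algorithm itself.
        cfg["algo"] = OtherOptionEntity(-1)
    return cfg["algo"].val
```

The returned value is then forwarded as the algorithm argument of the cuDNN call, so `cudnnFindConvolutionForwardAlgorithm` is only consulted in the fallback (`-1`) case.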
`dense_cublas.cuda`: The error comes from the callback function that tries to display the FLOPS. The reason is that `task.flops` is a `FloatImm` instead of a `float`, so `float(flops)` throws a type error. This PR lets the `add_flop` function support `FloatImm` and `IntImm` types; a simplified sketch follows.
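The idea of the `add_flop` change, simplified (not the exact diff; it assumes `FloatImm`/`IntImm` expose their numeric value via `.value`, as TVM's Python API does, and that the accumulator attribute is named `flop`):

```python
# Sketch: unwrap TVM immediates before accumulating FLOP counts so that
# later calls such as float(task.flops) no longer raise a TypeError.
from tvm import tir

def add_flop(self, flop):
    if isinstance(flop, (tir.FloatImm, tir.IntImm)):
        flop = flop.value  # plain Python float / int
    self.flop += flop
```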
cc @icemelon9 @merrymercy